X-VLNet for Text-Video Retrieval: Multi-Level Interaction with Fine-Grained Intra- and Inter-Modality Alignment

Published: